11 research outputs found

    Improving Network Performance Through Endpoint Diagnosis And Multipath Communications

    Components of networks, and by extension the internet, can fail. It is therefore important to find the points of failure and resolve existing issues as quickly as possible. Resolution, however, takes time, and it is important to maintain high quality of service (QoS) for existing clients while it is in progress. In this work, our goal is to provide clients with a means of avoiding failures where possible, to maintain high QoS, while enabling them to assist in the diagnosis process and so shorten the time to recovery. Fixing a failure relies on first detecting that it exists and then identifying where it occurred so that it can be remedied. We take a two-step approach. First, we identify the entity (client, server, or network) responsible for the failure. Next, if the failure is identified as network related, additional algorithms are triggered to detect the device responsible. To achieve the first step, we revisit the question: how much can you infer about a failure using TCP statistics collected at one of the endpoints of a connection? Using an agent that captures TCP statistics at one of the endpoints, we devise a classification algorithm that identifies the root cause of failures. Using insights derived from this classification algorithm, we identify the dominant TCP metrics that indicate where and why problems occur. When a failure is identified as a network-related problem, the second step is triggered: the algorithm uses additional information collected from "failed" connections to identify the device that caused the failure. Failures also disrupt user performance, and resolution may take time. It is therefore important to shield clients from their effects as much as possible. One option for avoiding problems resulting from failures is to rely on multiple paths, which are unlikely to go bad at the same time. The use of multiple paths involves both selecting paths (routing) and using them effectively.
    The second part of this thesis explores the efficacy of multipath communication in such situations. Multipath communication is expected to have monetary implications for ISPs and content providers. Our solution therefore aims to minimize such costs to content providers while significantly improving user performance.

    A Distributed Routing Protocol for Predictable Rates in Wireless Mesh Networks

    Wireless mesh networks hold the promise of rapid and flexible deployments of communication facilities. This potential notwithstanding, the often erratic behavior of multihop wireless transmissions limits the range of applications that such networks can target. In this paper we investigate the feasibility and benefits of a routing protocol explicitly aimed at making wireless mesh networks more predictable while preserving their efficiency and flexibility. The protocol's basic premise is the classical idea that a multipath solution can offer resiliency to unexpected link variations. The paper's contributions are in demonstrating how this can be effectively realized in a wireless context, and in offering initial evidence of its efficacy. In particular, the paper illustrates how routing decisions that account for link variability can be computed in a distributed fashion, and the benefits they afford in improving the stability of end-to-end transmission rates even in the presence of random network fluctuations.

    Mitigating the Performance Impact of Network Failures in Public Clouds

    Some faults in data center networks require hours to days to repair because they may need reboots, re-imaging, or manual work by technicians. To reduce traffic impact, cloud providers mitigate the effect of faults, for example, by steering traffic to alternate paths. The state of the art in automatic network mitigation uses simple safety checks and proxy metrics to determine mitigations. SWARM, the approach described in this paper, can pick orders of magnitude better mitigations by estimating end-to-end connection-level performance (CLP) metrics. At its core is a scalable CLP estimator that quickly ranks mitigations with high fidelity and, on failures observed at a large cloud provider, outperforms the state of the art by over 700× in some cases.